Optimistic Bandit Convex Optimization

Authors

  • Scott Yang
  • Mehryar Mohri
Abstract

We introduce the general and powerful scheme of predicting information re-use in optimization algorithms. This allows us to devise a computationally efficient algorithm for bandit convex optimization with new state-of-the-art guarantees for both Lipschitz loss functions and loss functions with Lipschitz gradients. This is the first algorithm admitting both a polynomial time complexity and a regret that is polynomial in the dimension of the action space that improves upon the original regret bound for Lipschitz loss functions, achieving a regret of Õ(T^{11/16} d^{3/8}). Our algorithm further improves upon the best existing polynomial-in-dimension bound (both computationally and in terms of regret) for loss functions with Lipschitz gradients, achieving a regret of Õ(T^{8/13} d^{5/3}).
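
The bounds above are stated for the standard bandit convex optimization protocol, in which the learner observes only the scalar loss value at the single point it plays each round. As context, here is a minimal sketch of the classical single-point gradient-estimation baseline (in the style of Flaxman, Kalai, and McMahan) that such algorithms improve upon; it is not the optimistic algorithm proposed in this paper, and the loss function, horizon, and step sizes below are illustrative placeholders.

```python
# Illustrative sketch only: a classical one-point gradient estimate for bandit
# convex optimization, NOT the optimistic algorithm of this paper. The loss
# function, horizon and step sizes are placeholders chosen for the demo.
import numpy as np

def bco_one_point(loss, d=5, T=10_000, delta=0.05, eta=0.005, radius=1.0):
    """Projected gradient descent driven by single-point bandit feedback."""
    rng = np.random.default_rng(0)
    x = np.zeros(d)                        # current iterate, kept inside the ball
    total_loss = 0.0
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)             # uniform random direction on the sphere
        y = x + delta * u                  # the point actually played
        f_y = loss(y)                      # only this scalar value is observed
        total_loss += f_y
        g = (d / delta) * f_y * u          # one-point estimate of the gradient
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius * (1 - delta):    # project so x + delta*u stays feasible
            x *= radius * (1 - delta) / norm
    return x, total_loss

# Example with a simple quadratic loss (an assumption for illustration).
x_final, cum_loss = bco_one_point(lambda z: np.sum((z - 0.3) ** 2))
```

The tension visible in this sketch, where a small delta reduces the bias of the estimate but inflates its variance, is the tradeoff that the tighter regret bounds in this line of work address.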

Related Articles

Regret Analysis for Continuous Dueling Bandit

The dueling bandit is a learning framework wherein the feedback information in the learning process is restricted to a noisy comparison between a pair of actions. In this research, we address a dueling bandit problem based on a cost function over a continuous space. We propose a stochastic mirror descent algorithm and show that the algorithm achieves an O(√(T log T))-regret bound under strong ...
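
Since the dueling-bandit feedback model may be unfamiliar, the sketch below illustrates how a noisy pairwise comparison can drive a mirror-descent-style update. It uses the Euclidean mirror map for simplicity and an invented comparison oracle, so it should be read as a toy illustration of the feedback model rather than the algorithm analyzed in that paper.

```python
# Illustrative sketch only: turning noisy pairwise comparisons into a
# mirror-descent-style update. The comparison oracle `prefer`, the step sizes,
# and the Euclidean mirror map are placeholder choices for the demo.
import numpy as np

def dueling_descent(prefer, d=5, T=5_000, delta=0.1, eta=0.05, radius=1.0):
    rng = np.random.default_rng(1)
    x = np.zeros(d)
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)             # random probing direction
        y = x + delta * u                  # dueling partner of the current point
        outcome = prefer(y, x)             # +1 if y beats x, -1 otherwise (noisy)
        x = x + eta * (d / delta) * outcome * u   # move toward the duel winner
        norm = np.linalg.norm(x)
        if norm > radius:                  # keep the iterate in the feasible ball
            x *= radius / norm
    return x

# Example comparison oracle (an assumption): y wins when its hidden cost is lower.
hidden_cost = lambda z: np.sum((z - 0.2) ** 2)
winner = dueling_descent(
    lambda y, x: 1 if hidden_cost(y) + 0.01 * np.random.randn() < hidden_cost(x) else -1
)
```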

An optimal algorithm for bandit convex optimization

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first Õ(√T)-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.

Multi-scale exploration of convex functions and bandit convex optimization

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. We use this map to solve a decade-old open problem in adversarial bandit convex optimization by showing that the minimax regret for this problem is Õ(poly(n)√T), where n is the dimension and T the number of rounds. This bound is ...

Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

Bandit convex optimization is one of the fundamental problems in the field of online learning. The best algorithm for the general bandit convex optimization problem guarantees a regret of Õ(T^{5/6}), while the best known lower bound is Ω(T^{1/2}). Many attempts have been made to bridge the huge gap between these bounds. A particularly interesting special case of this problem assumes that the loss...

An Empirical Analysis of Bandit Convex Optimization Algorithms

We perform an empirical analysis of bandit convex optimization (BCO) algorithms. We motivate and introduce multi-armed bandits, and explore the scenario where the player faces an adversary that assigns different losses. In particular, we describe adversaries that assign linear losses as well as general convex losses. We then implement various BCO algorithms in the unconstrained setting and nume...

Publication date: 2016